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Abstract 

Background: The filamentous fungus Aspergillus fumigotus has become the most important airborne fungal 
pathogen causing life-threatening infections in immuno-compromised patients. Recently developed 
high-throughput transcriptome and proteome technologies, such as microarrays, RNA deep-sequencing, and 
LC-MS/MS of peptide mixtures, are of enormous value for systematically investigating pathogenic organisms. In the 
field of infection biology, one of the priorities is to collect and standardise data, in order to generate datasets that 
can be used to investigate and compare pathways and gene responses involved in pathogenicity. The "omics" era 
provides a multitude of inputs that need to be integrated and assessed. We therefore evaluated the potential of 
paired-end mRNA-Seq for investigating the regulatory role of the central mitogen activated protein kinase (MpkA). 
This kinase is involved in the cell wall integrity signalling pathway of A fumigatus and essential for maintaining an 
intact cell wall in response to stress. 

Results: The comparison of the transcriptome and proteome of an A fumigatus wild-type strain with an mpkA null 
mutant strain revealed that 70.4% of the genome was found to be expressed and that MpkA plays a significant role 
in the regulation of many genes involved in cell wall remodelling, oxidative stress and iron starvation response, and 
secondary metabolite biosynthesis. Moreover, absence of the mpkA gene also strongly affects the expression of 
genes involved in primary metabolism. The data were further processed to evaluate the potential of the mRNA-Seq 
technique. We comprehensively matched up our data to published transcriptome studies and were able to show 
an improved data comparability of mRNA-Seq experiments independently of the technique used. Analysis of 
transcriptome and proteome data revealed only a weak correlation between mRNA and protein abundance. 

Conclusions: High-throughput analysis of MpkA-dependent gene expression confirmed many previous findings 
that this kinase is important for regulating many genes involved in metabolic pathways. Our analysis showed more 
than 2000 differentially regulated genes. RNA deep-sequencing is less error-prone than established 
microarray-based technologies. It also provides additional information in A fumigatus studies and as a result is more 
suitable for the creation of extensive datasets. 
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Background 

The development of new transcriptome techniques, 
coupled with the ability of creating and analysing huge 
databases, has paved the way for system biology's 
"Golden Age". The generation and processing of high- 
throughput data makes possible the investigation of the 
genome, transcriptome, proteome and metabolome of 
living organisms in an overall context. This is particu- 
larly useful for gaining a deeper understanding of patho- 
genic microorganisms [1]. 

The above holds also true for Aspergillus fumigatus, 
which can be regarded as the most important air- 
borne fungal pathogen. This fungus can cause a life- 
threatening disease, invasive aspergillosis (IA), in 
immuno-compromised patients. Patients that suffer 
from IA still have a low life expectancy. This is due 
to the lack of reliable diagnostic tools and of efficient 
antifungal therapies [2,3]. 

Transcriptome analysis has just recently been applied 
to A. fumigatus. The first genome-sequencing project, 
applied to the isolate Af293 and coupled with the first 
transcriptome data available, has been published [4] and 
a second genome, from the A. fumigatus strain A1163, is 
now available [5]. In the last few years, various micro- 
array platforms have been designed for global transcrip- 
tome analyses of A. fumigatus. These platforms have all 
been developed independently of each other using 
defined sets of primers and probes. However, developing 
microarray platforms for A. fumigatus has already been 
made outdated by the advent of new technologies based 
on cDNA sequencing. 

To show the potential possibilities of mRNA-Seq to 
accelerate Aspergillus research and to deepen our 
knowledge about the regulatory function of MpkA, we 
analysed A. fumigatus wild-type and the corresponding 
mutant lacking the mpI<A gene by cDNA sequencing 
and 2D gel-based proteomics. The gene mpkA codes 
for the mitogen activated protein kinase MpkA which 
acts on the A. fumigatus cell wall integrity (CWI) sig- 
nalling pathway [6,7]. The AmpI<A mutant strain is 
sensitive to cell-wall active compounds, oxidative 
stress and heat shock. The function of this gene is 
also related to polyamine metabolism and to the iron 
depletion response [8]. Moreover, it affects the expres- 
sion of several secondary metabolite gene clusters. 
The AmpkA mutant strain produces less gliotoxin 
than the wild type and this is a potent immunosup- 
pressant belonging to the epipolythiodioxopiperazine 
class of fungal toxins [8,9]. Our transcriptomic and 
proteomics data also allowed us to compare different 
omics-techniques. Analysis of the mRNA sequences 
obtained revealed unexpected novelties in the A. fumigatus 
genome. We also found that 30% of the transcriptionally 
active A. fumigatus genome has not been annotated in the 



canonical genome databases (e.g. CADRE, Broad, and 
AspGD) [10,11]. 

In addition, we addressed the question of how the data 
obtained with these new technologies differ from publi- 
cally available microarray data. To do this we compared 
published transcriptome studies carried out by using ei- 
ther microarrays or by applying mRNA-Seq with the 
mRNA-Seq dataset we generated. This comparison 
demonstrated that the data varies more among the dif- 
ferent microarray platforms than between mRNA-Seq 
experiments. Additionally, all the collected transcrip- 
tome data were compared with proteome data. Although 
the data obtained by mRNA-Seq was better correlated 
with the A. fumigatus proteome, there was still a rela- 
tively low correlation between the two different datasets. 

Results 

mRNA-Seq data set summary 

To gain a deeper insight into the regulatory circuits of 
the MAP-kinase MpkA in A. fumigatus we extracted 
RNA from a null mp1<A mutant and from the corre- 
sponding wild-type strain. We sequenced a total of six li- 
braries, three biological replicates for the AmpI<A strain 
and three for the wild-type strain. We obtained about 
263 million paired end reads. Of these, 193 million 
(74%) were uniquely mapped against the A. fumigatus 
A1163 genome. Using a stringent RPKM cut-off of 10 
reads per gene, 8172 genes (80% of the total annotated 
genes) were found to be transcribed in the wild-type 
strain we used. The coverage obtained was similar to 
that analysed in the dataset published by Gibbons et al. 
[12] for A. fumigatus (72% of the genes having an RPKM 
greater than 10). It was also similar to an mRNA-Seq 
study in A. oryzae [13], where the majority (83.4%) of 
the genome was covered by at least one read. Requiring 
coverage of at least 10 overlapping paired-end reads per 
base-pair, 70.4% (-20 million base pairs) of the genome 
was found to be expressed. This is remarkable, consider- 
ing the fact that according to the A. fumigatus CADRE 
genome database, only -50% of the genome is potentially 
transcribable. These findings indicated that the A. fumigatus 
transcriptome has been greatly underestimated. 

UTRs prediction and annotation of new genes 

So far, most studies have used mRNA-Seq data to iden- 
tify transcriptional islands, which are consecutive areas 
of overlapping reads [13]. We additionally checked these 
Transcript Active Regions (TARs) for open reading 
frames. Since this approach is error prone, we employed 
ab initio gene prediction combined with mRNA-Seq 
data to improve the quality of gene prediction (Figure 1). 
This evidence-based gene prediction has the advantage 
over the classic ab initio prediction of finding gene 
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Figure 1 Workflow for evidence-driven gene prediction. The evidence is obtained by deep-sequencing which provides valuable hints like 
splice junctions or expressed regions. 



structures by making use of the mRNA-Seq expression 
profiles. This combination incorporates intron-junction 
information derived from intron-spanning reads and 
allows the prediction of not translated regions (UTR) in 
a systematic manner. 

Gene prediction still requires manual work to find ap- 
propriate adjustments for integrating data within a clas- 
sic gene-predicting framework. By using mRNA-Seq 
data, we identified 185 new transcripts coding for puta- 
tive proteins (Additional file 1: Table SI and Table S2). 
However, we could not exclude the possibility that the 
new transcripts are part of untranslated regions (UTRs) 
or even part of genes that need to be re-annotated. 
Among all the identified new transcripts, 44 of them 
have unknown functions, while 141 have orthologs in 
the genomes of other fungi. 

Incorporating mRNA-Seq data into gene prediction 
also allowed the identification of UTRs (Additional file 1: 
Table S2). However, UTR prediction remains a difficult 
task as only expressed regions provide evidence for UTR 
length. The method can therefore result in vague length 
predictions, especially for genes with low expression, due 
to insufficient coverage. Nevertheless, the identification 
of 5'UTR can help to find upstream open reading 



frames (uORFs) and led in our case to the identification 
of one potential uORF among the newly identified 
genes (Additional file 1: Table S2). However, knowledge 
of UTR sequences in A. fumigatus remains poor. This 
is mainly due to the lack of available expressed se- 
quence tag (EST) libraries. Our analysis highlighted the 
fact that some genes can have multiple or alternative 
transcriptional starting points. We found an average 
median length for 5'UTR of 308 and for 3'UTR of 97 
nucleotides based on 9912 UTR sequences for each 
gene considered (Additional file 2: Figure SI). 

Comparison of transcriptome technologies 

To better evaluate the performance of microarrays and 
mRNA-Seq, differentially expressed transcripts were 
investigated. We compared transcript levels in both the 
wild-type strain and the AmpI<A mutant strain by repeat- 
ing the experiment performed by Jain et al. [8] . 

Transcriptome data obtained by mRNA-Seq identified 
2046 differentially regulated genes (log 2 fold change >2 
and likelihood >0.99). For comparison, using the Febit 
microarray platform, 653 differentially expressed genes 
were identified (fold change >1.5) (see Additional file 1: 
Table S3, Table S4, and Table S5). KEGG analysis 
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showed that common gene categories were enriched by 
the differentially expressed genes identified in the two 
datasets (Figure 2 and Additional file 1: Table S6, Table 
S7, and Table S8). The main difference between data sets 
was that mRNA-Seq gave a more detailed picture. The 
number of enriched genes for a specific category was 
substantially higher showing, for example, complete sec- 
ondary metabolite gene clusters to be differentially regu- 
lated (e.g. the gene cluster bordered by genes 
AFUB_000840 and AFUB_000750). It is noteworthy that 
genes enriched in the KEGG category 0.1. L, i.e. genes 
involved in metabolic pathways, were differentially regu- 
lated in the mRNA-Seq data, but in contrast, were not 
enriched in the data set obtained using microarrays. The 
majority of these genes, in particular genes putatively 
involved in fatty acid metabolism, were down-regulated 
in the AmpIcA mutant. This suggests that the RNA deep 
sequencing technique is more sensitive in finding differ- 
entially expressed genes than the microarray technique. 
This might be explained by the greater dynamic range 
covered by the RNA technique [14,15]. 

For a more detailed analysis, we computed log fold 
changes for the AmpI<A mutant vs the wild-type of the 
differentially expressed genes based on both microarray 
and mRNA-Seq data, including two biological replicates 
for each (Figure 3). Correlation of log fold changes of 



expressed genes was highest within the same technology. 
Correlation within mRNA-Seq replicates (Pearsons cor- 
relation coefficient r = 0.96) was higher than that be- 
tween the two microarray replicates (r = 0.53). This low 
correlation was mainly due to genes that only slightly 
changed their expression between conditions and are 
therefore more affected by the technological noise 
resulting from the technology used. A possible explan- 
ation is the low sensitivity of microarrays for low 
expressed genes that results in a smaller dynamic range 
compared to sequence data [15]. 

When comparing the global transcript fold changes 
between mRNA-Seq and microarrays the correlation 
amounted to 0.14-0.19 considering all expressed genes 
(Figure 3). This was not surprising because most genes 
are not expected to be differentially expressed, which 
consequently reduces the correlation. Since these genes 
are usually not of interest, we selected genes having the 
highest absolute fold change in both studies (10% quan- 
tiles of the total number of genes, intersection from 
mRNA-Seq and microarrays). The correlation value for 
this subset resulted in a large increase of both the 
within-array correlation (from 0.53 to 0.83) and the 
cross-technology correlation (between mRNA-Seq and 
microarrays, from 0.19 to 0.41). However, the 0.41 cor- 
relation is still lower than those found in other studies 
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0.1.1 Metabolic pathways 1.2.8 

0.1.2 Biosynthesis of secondary- metabolites 1.3.1 

1.1.1 Glycolysis Gluconeogenesis 1.3.16 

1.1.10 Pyruvate metabolism 1.3.3 

1.1.12 Propanoate metabolism 1.3.4 

1.1.13 Butanoate metabolism 1.3.5 
1 .1 .9 Amino sugar and nucleotide sugar met. 1 .3 .9 
1.2.1 Penicillin and cephalosporin biosynthesis 1.5.1 
1.2.7 Oxidative phosphorylation 1.5.12 



Methane metabolism 1.5.4 
Nitrogen metabolism 1.5.5 
Fatty a cid bio synthe sis 1.5.8 
Glycerophospholipid metabolism 1 .6.1 

Arachidonic acid metabolism 1 .6.5 
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Steroid biosynthesis 2.4.5 
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Figure 2 Comparison of KEGG-enriched categories. Enriched categories were obtained by analysing two different transcriptome analyses 
performed by mRNA-Seq (dark grey) and microarray (light grey), both performed to investigate genes expressed differently in a wild type 
compared to a AmpkA mutant strain. 
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Figure 3 Comparison of log 2 fold changes. Analysis of wt vs AmpkA mutant among and across both technologies (microarrays and 
mRNA-Seq). Genes are coloured blue if they were up-regulated (being in the 10% quantile of the genes with the highest fold-change) in both 
technologies (181 DEGs), red if down-regulated in both technologies (141), green if down-regulated on the arrays and up-regulated for sequence 
platform (70) and cyan for the opposite case (141). 



(e.g. Marioni et al. [16] found a correlation of 0.73 by 
comparing log fold changes between Affymetrix arrays 
and lllumina sequencing in experiments performed on 
human samples). 

With the purpose of having additional values for 
evaluating the transcriptome data from the two technical 
platforms, qRT-PCR was performed in order to quantify 
transcript levels of genes that showed different expres- 
sion levels in both studies (Figure 4, Additional file 1: 



Table S9 and Additional file 3: Figure S2). Among the 15 
transcripts investigated, 10 confirmed the direction of 
expression change detected by both transcriptome ana- 
lysis techniques (with higher significance for mRNA-Seq 
data). Surprisingly, qRT-PCR did not reveal significant 
changes in some genes that showed opposite regulation 
in the analyses of the two techniques. For instance, qRT- 
PCR confirmed the mRNA-Seq results for the sidD gene, 
coding for a non-ribosomal peptide synthetase involved 
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Figure 4 Differentially expressed gene and protein abundances. A) qRT-PCR used to check genes that showed differential expression 
between wt and AmpkA strains during microarray, mRNA-Seq and 2D proteome analysis. The relative amount of transcripts was normalised by 
setting the value for each wt gene to 1; B) table listing the genes investigated with their respective fold change {AmpkA vs wt strains) during 
microarray, mRNA-Seq and 2D proteome analysis. 



in siderophore production (AFUA_3G03420). Micro- 
array data showed that genes expressed during low iron 
stress response were down-regulated in the AmpkA mu- 
tant strain. Previous experiments reversed microarray 
results, demonstrating that MpkA negatively regulates 
genes involved in siderophore production, indicating 
that many genes involved in iron depletion response are 
up-regulated in the AmpJ<A mutant strain (e.g. sidA or 
sidD) [8]. Current mRNA-Seq data confirmed our ex- 
perimental hypothesis, by showing that genes involved 
in the iron starvation response are, in fact, up-regulated. 

In order to study absolute transcript abundance rela- 
tionships within and across various microarray and 



mRNA-Seq platforms, we performed a comprehensive 
comparison. Analysis of transcriptome techniques 
included five microarray studies (three different plat- 
forms) and two different experiments performed using 
mRNA-Seq analysis (created in different labs) (Table 1). 
In this comparison, we included one representative sam- 
ple corresponding to wild type for each experiment. For 
the study of Jain et al. [8] we also included two inde- 
pendent probe measurements in addition to the stand- 
ard one (i.e., combined) to assess within-platform 
expression. 

In total, the dataset compiled consists of seven wild- 
type measurements that were compared in a pair-wise 
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Table 1 Arrays analysed 



N° Technique Platform Used strains Growth conditions Study 



References Accession 



Technology 



1 Microarray TIGR 

(30kv1) 



Af293 



2 Microarray Custom Af293 
(Roche) 



3 Microarray Febit 



CEA17 



4 Microarray TIGR (22 K v3) ATTC 46645 



5 Microarray Febit 



6 mRNA-Seq lllumina 



7 mRNA-Seq lllumina 



CEA17 



ATTC 46645 
Af293 CEA10* 



CEA17 



8 Proteomic 2D-DIGE ATTC 46645 



9 Proteomic 2D-DIGE CEA17 



5 x 10 6 conidia per ml, 
inoculated in complete 
media, for 17 hours, 
at 30-37°C, 160 rpm 

10 8 conidia per ml, 
inoculated in potato 
dextrose media, for 
24 hours, at 28°C, 
160 rpm 

10 6 conidia per ml, 
inoculated in minimal 
media, for 16 hours, 
at 37°C, 200 rpm 

10 7 conidia per ml, 
inoculated in Brian's 
media, for 14 hours, 
at 37°C, 150 rpm 

10 6 conidia per ml, 
inoculated in minimal 
media, for 16 hours, 
at 37°C, 200 rpm 

10 7 conidia per ml, 
inoculated in Brian's 
media, for 14 hours, 
at 37°C, 150 rpm 

10 6 conidia per ml, 
inoculated in minimal 
media, for 16 hours, 
at 37°C, 200 rpm 

10 6 conidia per ml, 
inoculated in minimal 
media, for 16 hours, 
at 37°C, 200 rpm 

10 6 conidia per ml, 
inoculated in minimal 
media, for 16 hours, 
at 37°C, 200 rpm 



Temperature shifts [4] 
(30 to 37°C and 
30 to 48°C) 

Effects on growth [18] 
in the presence of TSA 



Comparison among [25] 
wild type, AgprC, and 
AgprD mutant 

Comparison between [17] 
planktonic and 
biofilm growth 

Comparison between [8] 
wild type and AmpkA 
mutant 



Comparison between [12] 
planktonic and 
biofilm growth 

Comparison between This study 
wild type and 
AmpkA mutant 

Design the A. fumigotus [21] 
mycelial and 
mitochondrial proteome 
map 

Comparison between 
wild type and AmpkA 
mutant 



ArrayExpress: 
E-MEXP-332 



DNA amplicon 
microarray 



GEO: GSE19682 60mer 

oligonucleotide 



Omnifung in situ 

oligonucleotide 



GEO: GSE1 9430 in situ 

oligonucleotide 



Omnifung 



On enquiry 



in situ 

oligonucleotide 



GAIIx 



ArrayExpress: GAIIx 
E-MTAB-1236 



Omnifung MALDI-TOF^OF 



This study On enquiry MALDI-TOF^OF 



GEO: gene expression omnibus (www.ncbi.nlm.nih.gov/geo); Omnifung (www.omnifung.hki-jena.de); ArrayExpress (http://www.ebi.ac.uk/arrayexpress/). *CEA17 
strain is the uracile auxotroph derived from the CEA10 strain. 



manner. A pair wise scatter plot was realised by comput- 
ing the Pearson and Spearman correlations for all pairs 
(Figure 5 and Additional file 1: Table S10 and Table 
Sll). Spearman is a rank correlation technique that can 
capture some non-linear relationships and is less prone 
to outliers produced by highly abundant genes. As for 
Marioni et al. [16] and Feng et al. [14] the correlation 
between replicates for the same experiment ranged from 
0.93 for the dataset of Bruns et al. [17] to 0.99 for the 
both mRNA-Seq studies, indicating high platform- 
reproducibility for microarrays as well as for sequencing 
data (data not shown). Therefore, it was sufficient to in- 
clude only one representative replicate for each platform 
to obtain concise comparisons. 

The transcriptome data reported showed that expres- 
sion values measured for transcripts were strongly scat- 
tered. Cross platform correlation was quite low, ranging 



from 0.12 ([17] vs [18]) to 0.52 ([4] vs [18]). Correlation 
between the independent measurements (probes A and 
B) as calculated for the dataset of Jain et al. [8] was only 
0.64, although all probes were exposed to the same sam- 
ples. This indicates a strong technological bias of micro- 
array data due to variation in hybridisation. At the same 
time, the correlation between the two Febit datasets for 
the combined probe measurements was quite high (0.89) 
pointing to a high conservation of technological bias in 
microarray platforms. In order to perform a general 
comparison between technologies, i.e., microarrays vs 
mRNA-Seq, the correlation of data within the same 
technology was used as a reference for the correlations 
between technologies. The highest correlation (0.68) 
among comparisons of mRNA-Seq data to microarray 
data was higher than the highest correlation between 
arrays (0.52, Roche vs. TIGR vl). This points to the fact 
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Figure 5 Pair-wise scatterplot of transcriptome datasets. Pair-wise scatterplot of the absolute RNA expression values measured by several 
microarray platforms as well as mRNA-Seq: upper triangle, the pair-wise scatterplots; lower triangle, the corresponding Pearson correlation r 
(Spearman correlation coefficient r s in brackets) for each pair. A high correlation (with the maximum being one) indicates great agreement of 
expression between the datasets compared. The green (red) coloured dots correspond to genes that are highly (little) expressed in the 
mRNA-Seq data and little (highly) expressed in the microarray study of Jain el al. (201 1) to check for bias due to the different technologies. 



that RNA sequencing data are less affected by techno- 
logical bias, which is confirmed by the scatter plot com- 
paring both mRNA-Seq datasets (Figure 5). 

Comparison between A. fumigatus transcriptome and 
proteome 

In order to analyse a possible correlation between tran- 
scripts and protein abundances as well as fold-changes, 



mRNA-Seq data, microarray data and the proteome data 
of the A fumigatus wild- type and AmpI<A mutant strain 
were compared. Looking at differential expression, 48 
different proteins were identified by 2D-gel electro- 
phoresis and subsequent MALDI-TOF / TOF, which 
showed a significant fold change in abundance (employing 
DIGE) in the AmpI<A mutant compared to wild type. 
Gene enrichment and pathway analysis revealed that 
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abundance of many proteins involved in metabolic pro- 
cesses and in oxidation/reduction processes is affected 
by MpkA (Additional file 1: Table S12 and Additional 
file 4: Figure S4). 

The regulation of protein levels was also compared 
with transcriptome data obtained from microarrays 
developed by Febit and used in the study of Jain et al. 
[8] and our mRNA-Seq data. As in earlier results 
(Figure 3, based on 9800 genes rather than 48) the 
fold change correlation between microarrays and 
mRNA-Seq was low (r = 0.14, Additional file 5: Figure 
S3). The measurements for transcriptome fold change 
often display low reproducibility because of difficulties 
in capturing the "true" fold change. For this reason, a 
low correlation between proteome and transcriptome 
technologies was assumed. However, the correlation 
between data obtained by microarrays and the prote- 
ome data was significantly higher (r = 0.28) than the 
correlation between mRNA-Seq data vs proteome 
(r = 0.19). As in the comparative analysis of the tran- 
scriptome data, the low correlation applies to all 
genes with a low fold change resulting in a higher 
relative impact of technological noise. Furthermore, it 
has already been shown by studies on human and 
mouse [19,20] that the relationship between transcrip- 
tome and proteome data is quite complex and pro- 
tein levels are greatly influenced by post-translational 
processing and inherent variations in stability. These 
factors result in low correlations. 

In order to study absolute protein abundances, we rea- 
nalysed the A. fumigatus 2D proteome map generated by 
Vodisch et al. [21], resulting in the identification of 312 
unique proteins. These proteins were also present in 
the different transcriptome data mentioned above 
(Additional file 1: Table S13 and Table S14). In A. fumigatus, 
the relative abundances of the proteome map correlated 
surprisingly well with mRNA-Seq expression levels 
(r = 0.36 for our data, and r = 0.5 for data from [12], 
Figure 5) as well as with most microarrays, in particular 
the Febit arrays (r = 0.47-0.49). It is also noteworthy 
that the genes on the scatter plots (Figure 6) are more 
equally distributed for mRNA-Seq vs proteome com- 
parison (left bottom corner) whereas, in particular for 
Febit arrays, a bias was observed towards genes that are 
highly expressed on the arrays but with low expression 
on the proteome-map. This is most probably due to 
cross-hybridisation. By contrast, this sort of bias was 
not observed in the NGS data. 

To check for genes that are potentially subject to post- 
transcriptional regulation, we selected genes that were 
very abundant in the mRNA-Seq data and rare in the 
proteome-map (coloured green in Figure 6) or the in- 
verse (coloured red). FunCat-analysis using FungiFun 
tools [22] indicated that the "green" genes {i.e. those 



genes that are highly expressed according to mRNA-Seq 
but little expressed according to proteome data) were 
not significantly enriched within a functional category 
(even after multiple correction), whereas the red ones {i.e. 
those genes that are little expressed according to mRNA- 
Seq but highly expressed according to microarray data) 
were over-represented within the tricarboxylic- acid path- 
way and amino acid metabolism. These processes con- 
stantly require large amounts of proteins that are 
therefore less subjected to transcriptional regulation. 

Discussion 

Over recent years, it has become evident that the 
eukaryotic regulatory machinery linking cause (for in- 
stance, stress) to effect (stress response) is very complex. 
It usually involves the interplay of several regulatory 
layers such as transcriptome, epigenome, and proteome 
that might be subject to independent regulatory 
mechanisms. To understand these interactions as a 
whole, integrated studies are required. 

We found that about 30% of the A. fumigatus genome 
expressed is located in assigned intergenic and intronic 
regions. This is most likely due to the presence of un- 
translated regions (UTRs) and transcripts that have not 
been classified so far. We investigated the potential of 
RNA deep-sequencing data to identify these genes by 
combining gene prediction (using the Augustus plat- 
form) and paired-end reads. This pioneer methodology 
helped us find almost 200 new transcripts with coding 
potential. Blast analysis revealed that more than half of 
these new transcripts harbour at least one ortholog in 
other Aspergillus species. This indicates the great poten- 
tial of this methodology to find new genes and to im- 
prove gene annotations. 

We tested RNA deep-sequencing to further investigate 
the role of MpkA in A. fumigatus. Previous studies 
demonstrated that this kinase is decisive for cell wall in- 
tegrity signalling pathway regulation ([7]; 2010). Further 
studies revealed that MpkA is also involved in the regu- 
lation of the oxidative stress response, secondary metab- 
olite production and siderophore biosynthesis during 
iron starvation [8]. Concerning the role of MpkA in A. 
fumigatus, mRNA-Seq data confirmed previous findings. 
Additionally, many genes involved in primary and sec- 
ondary metabolism were affected by MpkA. In particu- 
lar, genes putatively involved in amino acid biosynthesis 
were differentially expressed in the AmpIcA mutant 
strain. This finding explained the strong difference found 
in the soluble amino acid pool composition for this 
strain [8]. Moreover, almost 10% of genes differentially 
expressed in the AmpI<A mutant are putatively involved 
in primary metabolism (e.g. fatty acid metabolism and 
amino acid metabolism). These results confirmed that 
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Figure 6 Pair-wise comparison of all three technologies (microarrays, mRNA-Seq and 2D proteomics). The last column / row contain the 
comparison of the proteome-quantities of the proteome map [34] to RNA quantities measured by various microarray as well as mRNA-Seq 
platforms. 



one of the functions carried out by MpkA is to fine-tune 
the balance between stress response and energy consum- 
ing cellular processes. Considering the genes involved in 
the iron-deficiency stress response, it was found that dif- 
ferent techniques gave opposite results. It was previously 
reported that MpkA is important for regulating sidero- 
phore production by measuring an increased amount of 
siderophores in the AmpkA mutant strain during iron 
starvation [8]. This illustrates that mRNA-Seq is more 
reliable than other transcriptome techniques. However, 



the differences found between mRNA-Seq and qRT-PCR 
showed that it would be good practice if results obtained 
by sequencing were confirmed by applying ad hoc 
experiments. 

Microarray data showed that genes belonging to sev- 
eral A. fumigatus secondary metabolite biosynthesis gene 
clusters were differentially expressed in the AmpkA mu- 
tant {e.g. gliotoxin and pseurotin A). Secondary metabol- 
ite biosynthesis gene clusters are normally characterised 
by the presence of polyketide synthases (PKSs) or non- 
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ribosomal peptide synthetases (NRPSs). Analysis of the 
A. fumigatus A1163 genome revealed the presence of 17 
NRPSs, 13 PKSs and three hybrid NRPS/PKS enzymes 
[23]. 

Among them, 11 gene clusters have been found to be 
differentially regulated in the AmpI<A mutant. Further- 
more, in comparison with the microarray studies, the 
higher sensitivity of the mRNA-Seq technique led to the 
identification of almost all genes of a cluster to be differ- 
entially regulated. The mRNA-Seq approach is therefore 
superior to microarrays for studies on the global regula- 
tion of secondary metabolite biosynthesis. 

The second question that we addressed concerned the 
comparability of mRNA-Seq data with published tran- 
scriptome data. We compared different technologies to 
investigate the A. fumigatus transcriptome (microarrays, 
mRNA-Seq and qRT-PCR). Information obtained by 
analysing these technologies were compared with a com- 
prehensive set of proteome data obtained by 2D DIGE 
analysis followed by MALDI-TOF-MS / MS-based pro- 
tein detection. The comprehensive dataset compiled in 
this study constitutes a basis for future studies with the 
aim of shedding light on the relationship between prote- 
ome and transcriptome data, in order, for example, to 
understand better post-transcriptional regulation on a 
global scale. 

With regard to A. fumigatus, the results of whole gen- 
ome transcriptome analyses based on microarrays were 
highly dependent on the platform used. Low consistency 
of microarray gene expression data can have multiple 
reasons, such as platform effects [24], different media, 
RNA extraction protocols, or biological variability. How- 
ever, hybridisation seems to have the most significant 
impact. Hybridisation strongly depends on the probes, 
which were used to develop the microarrays. In the case 
of A. fumigatus, they have been designed independently 
from each other for the different platforms. The hybrid- 
isation bias seems to be highly conserved within the 
same platform, as shown by the high level of agreement 
between the studies of Jain et al. [8] and Gehrke et al 
[25]. When considering the comparison between Jain 
et al. [8] microarray data with the mRNA-Seq data 
produced in this study, the correlation is relatively low 
(r = 0.35). Moreover, comparing the two datasets 
obtained with the same experimental proceeding by 
Bruns et al. [17] using microarrays and by Gibbons et al. 
[12] using mRNA-Seq, once again showed quite a low 
correlation (r = 0.22). The global comparison demon- 
strated that a higher correlation was found between the 
two mRNA-Seq datasets. This finding demonstrates that 
mRNA-Seq gives the highest level of comparability in 
terms of gene expression levels in A. fumigatus. 

Comparative analysis of the A. fumigatus intracellular 
2D proteome map from Vodisch et al. [21] revealed that 



protein abundances correlated well with transcriptome 
abundances measured with both microarray (r = 0.34- 
0.50) and mRNA-Seq techniques (r = 0.36-0.50). These 
values were higher than the ones observed by Albrecht 
et al. [26], who compared changes in A. fumigatus 
microarray and proteome data after heat shock induc- 
tion. Previously, Foss et al. [27] calculated a correlation 
between proteome and transcriptome of r = 0.186 in 
Saccharomyces cerevisiae based on 278 proteins detected 
by LC-MS / MS analysis. Another study performed by 
Ghazalpour et al. [19] using LC-MS / MS-based quanti- 
fications of 486 mouse proteins, resulted in an average 
correlation of r = 0.27 (using Affymetrix microarray). 
However, although the results showed a better correl- 
ation between A. fumigatus proteome and transcriptome 
data, there was still a relatively low correlation between 
the two different datasets. This is most likely caused by 
a combination of several biases based on technological 
and biological effects. 

Conclusions 

The potential of next generation sequencing to investi- 
gate the regulation of the A. fumigatus transcriptome 
was demonstrated. We found that the transcriptional po- 
tential of A. fumigatus was underestimated. Almost 70% 
of the genome was found to be actively transcribed. Se- 
quence information obtained by mRNA-Seq was also 
used for gene prediction. We showed that the incorpor- 
ation of transcriptome-based assembly can be very help- 
ful for improving or confirming gene annotation. This 
methodology allowed us to identify 185 new transcripts 
never reported in the A. fumigatus A1163 strain. Com- 
pared to microarray data, we identified three times more 
differentially regulated genes in the AmpI<A mutant strain 
compared with the wild type. Comparative transcriptome 
and proteome studies pointed out that MpkA plays an 
important role not only in maintaining the cell wall 
structure under stress condition but also in affecting 
genes involved in primary and secondary metabolism. 

We compared data obtained by mRNA-Seq with those 
obtained by established microarray-based technologies 
for expression profiling. mRNA-Seq was found to be less 
error-prone and more suitable for the realisation of ex- 
tensive datasets that can be potentially created by differ- 
ent groups under different lab conditions. Consequently, 
data produced so far using different microarray plat- 
forms, can only be considered by focusing on the highly 
differentially expressed genes. 

Methods 

cDNA library construction and sequencing 

The A. fumigatus strain CEA17 KU8 ° strain was used for 
all experiments. The CEA17 KU8 ° strain was derived from 
the CEA17 wild-type strain [28]. 
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Total RNA was isolated from wild-type and AmpI<A 
strains cultured in Aspergillus minimal medium (AMM) 
for 16 h at 37°C [8], using the Qiagen RNeasy Plant Mini 
kit (Qiagen, Germany), according to the manufacturers 
instruction. Three biological replicates for each strain 
were collected. Total RNA was used for Illumina next- 
generation sequencing [29]. For library preparation 5 ug 
of total RNA per sample were processed using Illumina 
mRNA-Seq sample prep kit (RS-100-0801) following the 
manufacturers instruction. The cDNA libraries were 
sequenced using a Genome Analyser (GAIIx) in a 
paired-end approach with 2 x 36 cycles resulting in 
paired reads with a length of 36 nucleotides per read. 
Each library was sequenced on a single lane and ended 
up with around 30-40 million reads per sample. Se- 
quence data were extracted in FastQ format and used 
for further analysis. 

Mapping and normalising of transcriptome reads 

All reads were mapped using TopHat 1.2.0 [30] against 
the A.fumigatus A1163 genome (which is a derivative of 
the A. fumigatus CEA17 strain) retrieved from Ensemble 
release 9. Parameters were set according to preliminary 
investigations, using a minimum intron length of 30 and 
a maximum of 4000. In agreement with the protocol 
used, from preliminary mapping a mean inner distance 
between mate pairs of 307 was found with a standard 
deviation of 110. To avoid multiple hits for a single read, 
the maximum multi-hit option was set to one. 

Transcript expression levels were normalised by 
counting the number of reads per kilobase of exon re- 
gion per million mapped reads (RPKM) [31]. 

To detect genes differentially expressed in the wild 
type and the AmpI<A mutant, we applied the R package 
baySeq [32] which takes advantage of the three bio- 
logical replicates for each condition. RPKM and differen- 
tially expressed genes were determined using the A. 
fumigatus A1163 genome annotated by CADRE as a 
reference. 

Differentially expressed genes were categorised using 
GO enrichment, KEGG enrichment, and FunCat analysis 
using the FungiFun platform [22]. 

UTRs prediction 

With the purpose of identifying putative new transcripts, 
we performed new gene annotation employing Augustus 
ab initio prediction [33] combined with hints generated 
from the mRNA-Seq reads. We present a new annota- 
tion including UTRs for A. fumigatus A1163. These 
hints consisted of intron junctions that were identified 
by TopHat as well as exon-parts revealed with the 
IRanges package which is part of the Bioconductor col- 
lection. The workflow for new gene annotation is 
reported in Figure 1. 



Microarray data 

The transcriptome and proteome data analysed are 
summarised in Table 1. The datasets were obtained in a 
pre-processed state i.e. background-corrected and nor- 
malised. Since the different microarray platforms and 
even the different A. fumigatus genomes have used dif- 
ferent Gene-IDs (e.g. accession numbers starting with 
AFUB for the A1163 strain, and accession numbers 
starting with AFUA for the Af293 strain), all data were 
mapped according to the AFUA annotation, using en- 
semble mapping, to achieve comparability. The data 
were imported and analysed in the R / Bioconductor 
software environment employing the packages biomaRt 
and Biostrings. 

Real-Time PCR 

The qRT-PCR experiments to determine the amounts of 
selected putatively differently expressed genes were per- 
formed with StepOnePlus Real-Time PCR System (Ap- 
plied Biosystems), using myTaq HS Mix 2x (Bioline) and 
Evagreen (Biotium). For each transcript analysed more 
primers were tested in tandem, in order to obtain primer 
efficiency close to 100% in all cases. To validate primers, 
efficiency standard curves were realised considering 
seven serial dilutions of genomic DNA of A. fumigatus 
in three replicates. All the primers selected for the qRT- 
PCR analysis are shown in Additional file 1: Table S15. 
The A. fumigatus actinl gene (AFUA_6G04740) was 
chosen as the normalising gene. The cDNA was gener- 
ated using Revert Aid Premium (Fermentas). As negative 
controls, a vial was prepared without reverse transcript- 
ase, and another one was prepared without cDNA. 

Assessing differential expression 

To correlate the mRNA-Seq data with the corresponding 
presence of proteins, the mRNA-Seq data set was com- 
pared with the proteome data published by Vodisch 
et al. [21]. For our study, the normalised protein spot 
abundances on the 2D gels were reanalysed using the 
most recent version of the software Delta 2D 4.3 
(Decodon, Germany). 

The DIGE (difference in-gel electrophoresis) technique 
was used to analyse protein samples of wild type and 
AmpI<A mutant mycelium cultivated in AMM for 16 h 
at 37°C. Twelve samples (three independent biological 
replicates and two technical replicates for each strain as 
well as one mixed internal standard), were labelled with 
CyDye minimal dyes according to the manufacturers 
protocol with slight modifications (GE Healthcare Bio- 
Sciences, Munich Germany). We then labelled 50 ug 
total protein of each sample with 300pmol of CyDye 
DIGE fluorophores (dissolved in dimethyl formamide). 
Samples obtained from the wild-type and from the 
AmpkA mutant strain were labelled with either Cy3 or 
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Cy5. A pool of all six samples (3 x wt and 3 x AmpI<A) 
was prepared, labelled with Cy2, and used as a global in- 
ternal standard. We carried out 2D gel processing, spot 
screening and MALDI-TOF / TOF analysis with an 
ultrafleXtreme mass spectrometer (Bruker Daltonics, 
Germany), as previously described [34] with slight 
modifications. FlexControl 3.3 software was used for 
data collection and flexAnalysis 3.3 for spectra analysis 
and peak list generation (Bruker Daltonics, Germany). 
Peptide mass fingerprint (PMF) and peptide fragmenta- 
tion fingerprint (PFF) spectra were submitted to the 
MASCOT 2.3 server (Matrix Science, U.K.) searching 
the recent version of the NCPInr database but restricted 
to the fungi. Mascot search parameters were the following: 
fixed modification of cysteine thiols to S-carbamidomethyl 
derivatives, variable methionine oxidation, up to one 
missed cleavage and a peptide mass tolerance of 100 ppm. 

Accession number 

The RNA-Seq data were deposited in ArrayExpress 
(ArrayExpress accession: E-MTAB-1236). 
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Additional file 1: Tables.xls (Excel file which includes all the cited 
supplementary tables). 

Additional file 2: Figure SI. Length distribution of untranslated regions 
(UTRs). The analysis was based on 9912 3' and 5' UTR sequences. The red 
vertical lines indicate the average median length of the 5'UTRs (308) and 
the 3'UTRs. 

Additional file 3: Figure S2. Comparison of delta_delta_Ct values 
obtained by qRT-PCR analysis (see also Additional file 1: Table S9). Values 
obtained by qRT-PCR analysis where compared to log 2 fold changes (wt 
vs AmpkA) obtained by microarrays and mRNA-Seq based on 14 genes. 
Both technologies seem to agree with the qRT-PCR-data with an average 
Pearson correlation of r = 0.75 for microarrays and 0.55 for mRNA-Seq 
(Spearman correlation coefficient r s in brackets). 

Additional file 4: Figure S4. 2D gel electrophoresis of protein extracts. 
Total proteins were extracted from A. fumigatus wild type (Cy5, green) 
and AmpkA mutant strain (Cy3, purple). Proteins were stained with the 
difference in gel electrophoresis (DIGE) labelling technique. Total proteins 
were separated in a pH gradient of 3-11 (nonlinear). 

Additional file 5: Figure S3. Pair-wise scatterplot of all three 
technologies. Comparison of three different technologies that is 
2D-DIGE proteomic, mRNA-Seq and microarray, to detect differentially 
expressed genes and proteins between the Ampka strain and wild-type 
strain, based on log fold changes of 94 entries. The overall correlation 
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